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1 Introduction 



Still there are many open question in connection with the random field Ising model - 
RFIM — (see ref. [1] for a recent review on this subject). It has been shown rigorously 
that in more than two dimensions the RFIM possesses a second order phase transition 
at finite temperature for small enough field strength. Nevertheless there is still much 
uncertainty concerning the exponents characterizing this transition in three dimensions. 
Results of computer-simulations and experimental data seem to contradict each other if 
one tries to harmonize them with a proposed scaling theory for it (see [1] for details). We 
have the impression that more extensive simulation might help to clarify this situation. 

As a first step into this direction we present here an effective algorithm that can 
perform Monte Carlo simulations of the RFIM with a speed of 184 Milllion spin updates 
(MUPS) per second on one processor of a Cray YMP. It was developed out of the fast 
vectorized algorithm for the simulation of the three dimensional Ising model, which was 
originally invented by N. Ito [2], reaching a speed of 2190 MUPS on a Fujitsu VP 2600/10 
and 800 MUPS on the NEC-supercomputer. Later it was improved by H. O. Heuer [3] 
and implemented on a Cray YMP, where it reached a speed of 305 MUPS on one processor 
of a Cray YMP. 

The main idea we followed in the construction of the code is to consider the two 
cases "spin parallel/antiparallel to the external field" separately. Therefore (in the case of 
a binary distribution for the fields) one cannot do worse than double the time needed for 
the innermost loop in the RFIM in comparison to the pure case considered in [3] . Taking 
into account the fact that we use periodic boundary conditions instead of helical or self 
consistent boundary conditions (which gives a slowing down of approximately 10%) our 
code has surpassed this minimal requirement by about 30%. The speed of our algorithm 
has to be compared with the following data: 8 years ago the RFIM was simulated with 
a speed of 1 MUPS on a CDC 176 [4] and one year later the distributed array processor 
(DAP) at Queen Mary College, London was able to update 14.6 Million spins per second 
[5]. 

The exact definition of the model that the algorithm is able to simulate is as follows. 
We consider a simple cubic lattice of linear dimension L with N = L * L * L Ising Spins 
Si = ±1. The Hamiltonian of the RFIM is 

H = -jj2s t s J -J2h l s l , (i) 

(ij) i 



2 



where the first sum extends over all nearest neighbor pairs (ij) and the second sum over 
all sites. By rescaling the temperature we confine ourselve to the case J = 1. The external 
fields hi are random variables obeying a binary probability distribution 

P(hi) = p ■ S(hi -h) + (l-p)- S(hi + h) , (2) 

where p G [0, 1]. Most of the literature deals with p = 1/2. Note that p = 1 or p = yields 
the 3-d Ising model in a homogeneous external field. For the version of the algorithm we 
present in this paper the field strength h has to be smaller than 2, i.e. h G [0, 2], for higher 
field values slight modifications have to be incorporated. If one restricts oneself to field 
strength smaller than one (h < 1), an even faster (242 Million spin-updates per second) 
version of the algorithm can be used, which is described in appendix A. 

Since much of the code is a straightforward generalization of Heuer's algorithm [3] 
we do not spend too much time in explaining the bulk of it. Only the innermost loop has 
to be described in detail, which is done in section 2. Furthermore we want to focus some 
attention on the following point. The algorithm uses multispin coding and simulates 64 
different systems at once. The essential speed-up in comparison to older algorithms is 
achieved by using the same random number for several systems. In the 3-d Ising model 
one has to simulate the 64 systems at different temperatures, otherwise one would run 
into difficulties, since all 64 systems are identical, apart from the initial condition. In 
our case, the RFIM, we deal with 64 physically different systems, because of the different 
realizations of the disorder, i.e. random field configurations. This is very convenient, 
since at the end of the simulations we have to perform the average over many disorder 
realizations anyway. Therefore we choose the same temperature for each system and 
collect at the end the data for magnetization, energy etc., until the desired number of 
realizations (most conveniently a multiple of 64) is reached. 

A few words to the whole program: First one initializes the random number genera- 
tor (which is a shift-register RNG a la Tausworth [6]), then the initial spin-configurations 
of all 64 systems (each bit in a computer word corresponds to one spin in one system). 
Now the random field configurations are generated — a bit in a computer word is one 
if the random field at a particular site of one system is positive, otherwise it is zero. 
Furthermore one has to initialize the demon-arrays described below and also the nearest- 
neighbor arrays. We use periodic boundary conditions since then finite size scaling is 
expected to be more easy. To achieve vectorization one has to split the whole system into 
two sublattices of size N/2. Therefore the linear dimension L of the cube has to be an 
even number. 
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After the initializations everything is set to perform the update algorithm according 
to Metropolis, where a spin is flipped with a probability 

w flip = min{exp(-/3A£), 1} , AE = E(-S t ) - . (3) 

The subroutine sweep described in section 2 updates sequentially all spins in one sublat- 
tice. Hence for each MC-step the routine sweep is called twice: once for the update of the 
first sublattice and once for the second. Therefore one needs also two different nearest- 
neighbor arrays for the x-direction in the different sublattices. For measurements one 
needs an effective bit-counting routine, which can be found in [3]. 

2 Description of the innermost loop 

The innermost loop is listed in figure 1. It is written in Fortran 77 and vectorizes on 
the Cray YMP. We tried to use a similar notation as in the literature [2,3,7], to make 
comparisons easier. Because we use a recursive algorithm to generate the random numbers 
the compiler directive "ignore vector dependency" IVDEP occurs. Since the vector length 
of the Cray computer systems is 64 this directive has no influence on the correctness of 
the code (see [3] for details). 

The subroutine sweep updates the spins s(i) (i = 1,...,N) of one sublattice, 
therefore N equals one half the total number of spins. The neighboring spins in the second 
sublattice are stored in sneighbO, and the random fields are stored in h(). The integers 
nxm ( i ) , nxp ( i ) nym ( i ) , nyp ( i ) , nzm ( i ) and nzp ( i ) are the indices from the six nearest 
neighbors of s(i) (note that we use periodic boundary conditions). The array ir() 
contains 256 random integers between and irlst, each composed of irbit random 
bits. It is the essential part of the shift-register random number generator. The function 
of the demon-arrays ixpl(), ixp2(), ixp3(), ixml() and ixm2() will be described later. 

Let us assume that the arrays s(), sneighbO and h() are bit-arrays — ignoring 
the fact that we deal with 64 bits at once and in parallel. The six bits il, . . . , i6 (lines 
13-18) contain the information about the orientation of spin s(i) with respect to its six 
neighbors. The bit is one if s(i) is antiparallel to the corresponding neighbor and it is 
zero otherwise. The bit ih (line 19) is one if the spin s(i) and the random field h(i) are 
pointing in the opposite direction. 

In lines 20-30 the number of antiparallel pairs £ = il + . . . + i6 is calculated. The 
summation is done by adding il, i2 and i3 first and storing the result in binary code 
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into the two bits j 1 and j2 (il + ±2 + i3 = 2 1 * j2 + 2° * jl, see lines 20-22) and then 
adding i4, i5 and i6, which yields j3 and j4 (see lines 23-25). In lines 26-30 the results 
of these two summation is added and since it is a number between and 6 one needs 
three bits (b3, b2, bl) to store this information in binary code. Their meaning is given by 
E = 2 2 *b3 + 2 1 *b2 + 2°*bl. 

The random number irt, which is needed for the update of all 64 spins s(i) in the 
different systems, is generated in lines 31-34. To save memory the random numbers lie 
on a wheel with 256 spokes, which is enforced by the periodic neighbor array irndxO 
(see [3] for details). 

To make a flip-decision one has now to consider the possibilities ih = (spin and 
random field pointing into the same direction) and ih = 1 (spin and random field pointing 
in opposite directions) separately. The case ih = is considered first (lines 35-41). A look 
at table 1 may help to explain the procedure. There we defined po = exp[— (12 + 2h)f3], 
Pi = exp[— (8 + 2h)/3], pi = exp[— (4 + 2h)/3] and p 3 = exp[— 2h/3]. One reads the table in 
the following way: If for instance E = 0, all six nearest neighbors are parallel to spin s(i) 
and the energy difference to the state with s(i) flipped would be AE = 12 + 2h, which 
means that s (i) should be flipped with a probability p = exp[— (12 + 2/i)/3]. Analogously 
for E = 1, . . . , 6. 
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AE 


flip-prob. 


ixp 





12 + 2/i 


Po 
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1 


8 + 2/i 


Pi 
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4 + 2/i 


P2 


2 


3 


2h 


P3 


1 


4 


< 


1 





5 


< 


1 





6 


< 


1 






Table 1 



Here it is important that the field strength h is not greater than 2, otherwise also the 
energy differnce AE in the case E = 4 would be positive (AE = — 4 + 2h > for h > 2), 
which would result in a flip probability smaller than one. Nevertheless for h > 2 the 
algoritm can easily be modified. In fact one has to change the algorithm for each of the 
cases h G [2,4], h G [4,6] and h > 6 in a different way, but for physical reasons field 
strength larger than two are not advisable to simulate. 
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Now we construct a flip-bit idO by adding to the sum £ a demon-number ixp G 
{0, 1, . . . , 4}, which is composed by 3 demon-bits ixl, ix2 and ix3 via ixp = 2 2 * ix3 + 
2 1 * ix2 + 2°* ixl. If (11 + ixp) > 4, then the spin has to be flipped, which means idO = 1, 
otherwise idO = 0. Certainly idO = 1 if b3 is one (line 40), also if ix3 is one (line 41). If 
b3 and ixp3 are both zero there is still the possibility that the following addition yields 
an overflow bit: 

( b2 bl ) 
+ ( ix2 ixl j (4) 
~ idO i i j~ 

which is done in lines 38-39. First (line 38) one checks, whether the addition of bl and 
ixl is greater than 1, which yields an overflow bit for the sum of the lowest bits if the 
two numbers £ and ixp. Then (line 39) one takes this overflow bit and adds it to the sum 
of b2 and ix2. This gives an overflow bit for the sum of the two smallest bits of each of 
the two numbers £ and ixp. If this overflow bit is one it means that the sum is greater 
than 4, as desired. 

From table 1 we also learn with which probabilities the demon-number ixp has to 
be set to the different values: it has to be 4 with probability po, 3 with probability pi — po, 
2 with probability p 2 — pi, 1 with probability p^ —p2 and with probability 1 — p^. If one 
works with a computer that has only a very limited memory capacity one can operate 
with if-instructions in an obvious way — here we use a time saving trick (but also memory 
gobbling, see [3]): We define the arrays ixpl(), ixp2() and ixp3() as shown in table 2, 
where irlst = 2 irblt — 1 and k = irlst*p , k\ = irlst * (pi — p Q ) , k 2 = irlst* (p 2 — Pi), 
k 3 = irlst * (p 3 — p 2 ). Note that the number of bits irbit determines the accuracy of 
the probabilities: here they have only irbit significant bits and therefore irbit should 
not be smaller than 17. 
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Table 2 



Remember that the shift-register random number generator in line 33 produces an integer 
irt uniformly distributed between and irlst and therefore ixp = 2 2 * ixp3(irt) + 2 1 * 
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ixp2(irt) + 2° * ixpl(irt) has the desired feature to be four with probability p Q , three 
with probability (p\ —po), etc. 

Now that we have calculated the flip-bit that has to be used in the case in = we 
come to the construction of the flip-bit idl for the possibility in = 1. First we compute 
E' = E + 1 (see lines 42-44 — the reason for this will soon be clear) and overwrite the 
bits b3, b2 and bl with this result in binary code. Now we might need table 3, where 
p' = exp[-(12 - 2h)p], p\ = exp[-(8 - 2h)(3] and p' 2 = exp[-(4 - 2h)(3}. 
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Table 3 



Once again it is important that the field strength h is smaller than 2, otherwise the 
probability p 2 would be greater than one. The table has to be read in complete analogy 
to table 1. Again we add a demon number ixm to E' and the flip-bit idl is one if 
(E' + ixm) > 4, which is certainly the case if b3 = 1 (line 49). Again idl is also one if 
the same addition as in (|]) yields an overflow-bit (lines 47-48). 

The arrays ixml () and ixm2 () are defined as shown in table 4, where k' = irlst*£>Q, 
k[ = irlst * (p[ —p'o) and k' 2 = irlst * (p' 2 — p[). Note that in contrast to the case in = 
here ixm ranges only from to 3 (which saves one array of length irlst) since we added 
already a one to E. 
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Table 4 
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Now we construct the proper spin-flip bit id (line 50) via 



id = 



idO 
idl 



if 
if 



ih = , 
ih = 1 , 



(5) 



which is slightly faster than id = xor(idO,and(ih,xor(idO, idl) ) ) proposed in [8] for 
the same operation. Finally the spin s(i) is flipped if and only if id is one (line 51). 
Instead of completing a cycle on the above mentioned wheel with 256 spokes for the RNG 
after each sweep through one sublattice we use a counter i counter that starts the wheel 
at the correct position each time the subroutine sweep is called. 

3 Summary 

We presented a fast vectorized algoritm for the MC-simulation of the three-dimensional 
random field Ising model. We would like to point out that this algorithm is also very 
fast on non-vector machines. In this case one can also dispense with the tricks that 
have to used to achieve vectorization (like the two sublattices). By making the necessary 
modifications for 32-bit integers one can implement it also very efficiently on special- 
purpose computers, parallel computers and transputer systems. In fact this is the way 
we used it to perform very extensive simulations of the RFIM - - the results of this 
investigation will be reported elsewhere [9]. 

Let us only mention that the equilibration time of the random field Ising systems 
depends strongly on the strength of the external fields. For small fields (where one can 
use the faster version of the algorithms peresented in this paper) the equilibration in finite 
systems is nearly as fast as in the pure Ising model — but very strong cross-over effects 
are expected and the separation of the critical behavior of the disordered system from the 
pure system is difficult. Therefore we prefered higher field strength (where one has to use 
the slightly slower algorithm) — but in this case the equilibration of the systems takes 
much longer, even for small system sizes. This makes the need of a fast algorithm for the 
MC-simulation of the RFIM obvious — at least as long no efficient cluster-algorithm is 
available. 
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Appendix 

Here we present a version of the algorithm that ca be used in the case of h < 1 and which 
is about 31% faster. The reason of this speed-up lies in the fact that one needs not to 
consider the two cases ih = and in = 1 separately. In fact for the version we want to 
present one has to replace lines 38-50 of the code in figure 1 by the following four lines: 



38' id = or ( ih, ixl ) 

39' id = xor ( and(bl,ix2) , and( id, xor(bl,ix2) ) ) 

40' id = xor ( and(b2,ix3) , and( id, xor(b2,ix3) ) ) 

41' id = or ( id, b3 ) 



The arrays ixml() and ixm2() are not needed anymore. Also the variable idO and 
idl are superfluous. To explain the above modifications we take a look at table 5, where 
we have defined 

E = 2*E + ih+l (6) 

and p t = exp[-/3A£(E)]. 
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We observe that only as long as h < 1 the flip-probabilities are monotonically increas- 
ing with X, which is necessary for the algorithm to work. To construct the flip-bit id we 
add to X a demon-number ix G {0, 1, . . . , 7}, which is composed of 3 demon bits ixl, ix2 
and ix3 via ix = 2 2 * ix3 + 2 1 * ix2 + 2° * ixl. If (X + ix) > 8 the spin has to be flipped, 
meaning id = 1. Obviously id = 1 if b3 = 1 (line 41'). The only other possibility for id 
to become one is an overflow-bit on the following addition: 

( b2 bl in ) 
+ ( ix3 ix2 ixl ) ^ 

( id • • • y 

which is done in lines 38'-40'. The first term in this sum corresponds to the lower three 
bits of X in binary representation — the bits b2, bl are shifted to the left because of the 
multiplication of X with two and the lowest bit is then occupied by in since it has to be 
added. Furthermore in the definition of X we added a one, which is the origin of the last 
term in the sum. The arrays ixpl(), ixp2() and ixp3() have to be defined as depicted 
in table 6, where k v = irlst * (p u — p u -\), (po = 0). 
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Table 6 



This modification of the demon arrays has of course to be done in the initialization 
procedures. 



10 



References 

[1] D. P. Belanger and A. P. Young, Journal of Magnetism and Magnetic Mate- 
rials 100 (1991) 272. 

[2] N. Ito and Y. Kanada, Supercomputer 25 (1988) 31. 

[3] H. O. Heuer, Comput. Phys. Commun. 59 (1990) 387. 

[4] D. Stauffer, C. Hartzstein, K. Binder and A. Aharony, Z. Phys. B55 (1984) 
325. 

[5] A. P. Young and M. Nauenberg, Phys. Rev. Let. 54 (1985) 2429. 

[6] R. Tausworth, Math. Comput. 19 (1965) 201. 

[7] E. Bhanot, D. Duke and R. Salvador, J. Stat. Phys. 44 (1986) 85. 

[8] F. Bagnoli, Int. J. Mod. Phys. 3 (1991) 307. 

[9] H. Rieger and A. P. Young, in preparation. 



11 



Figure 1 



1 subroutine sweep (s,sneighb,h,nxm,nxp,nym,nyp,nzm,nzp,N) 

2 integer s(N) , sneighb(N) ,h(N) 

3 integer nxm(N) ,nxp(N) ,nym(N) ,nyp(N) ,nzm(N) ,nzp(N) 

4 integer spin,bl ,b2 ,b3 

5 parameter (irbit=18) 

6 parameter (irlst = 2**irbit-l) 

7 common /rnddim/ ir(0:255) ,irndx(2, 0:255) ,icounter 

8 common /rngdl/ ixpl (0 : irlst) , ixp2(0 : irlst) , ixp3(0 : irlst) , 

9 1 ixml (0: irlst) ,ixm2(0: irlst) 

10 CDIR$ IVDEP 

11 do 100 i = 1,N 

12 spin = s(i) 

13 il = xor( spin, sneighb(nxm(i)) ) 

14 i2 = xor( spin, sneighb(nxp(i)) ) 

15 i3 = xor( spin, sneighb(nym(i)) ) 

16 i4 = xor( spin, sneighb(nyp(i)) ) 

17 i5 = xor( spin, sneighb(nzm(i)) ) 

18 i6 = xor( spin, sneighb(nzp(i)) ) 

19 in = xor( spin, h(i) ) 

20 j2 = xor( il, i2 ) 

21 jl = xor( j2, i3 ) 

22 j2 = xor( and( il, i2) , and( j2, i3 ) ) 

23 j4 = xor( i4, i5 ) 

24 j3 = xor( j4, i6 ) 

25 j4 = xor( and( i4, i5) , and( j4, i6 ) ) 

26 bl = and( jl, j3 ) 

27 b3 = xor( j2, j4 ) 

28 b2 = xor( b3, bl ) 

29 b3 = xor( and( j2, j4 ), and( b3, bl) ) 

30 bl = xor( jl, j3 ) 

31 index = i + icounter 

32 j = and( index, 255 ) 

33 irt = xor( ir (irndx(l , j ) ) , ir(irndx(2, j)) ) 

34 ir(j) = irt 
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35 ixl = ixpl(irt) 

36 ix2 = ixp2(irt) 

37 ix3 = ixp3(irt) 

38 idO = and( bl, ixl ) 

39 idO = xor( and(b2,ix2), and(xor (b2, ix2) , idO) ) 

40 idO = or( idO, b3 ) 

41 idO = or( idO, ix3 ) 

42 b3 = or( b3, and( b2, bl) ) 

43 b2 = xor( b2, bl ) 

44 bl = not ( bl ) 

45 ixl = ixml(irt) 

46 ix2 = ixm2(irt) 

47 idl = and( bl, ixl ) 

48 idl = xor( and(b2,ix2), and(xor(b2,ix2) ,idl) ) 

49 idl = or( idl, b3 ) 

50 id = or ( and(idO ,not (ih) ) , and (idl, in) ) 

51 s(i) = xor( s(i), id ) 

52 100 continue 

53 icounter = icounter + N 

54 return 

55 end 
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